Using the ABBYY OCR Object
The ABBYY OCR object in Advanced Process Automation (APA) enables you to capture text from PDFs or images using Optical Character Recognition (OCR). This object was added in APA 7.6.
This OCR functionality is included in the Connectivity.OCR library.
The functionality was updated in APA 7.7 to add support for:
- Reading cells in complex, asymmetrical tables.
- Reading text in various languages.
- Retrieving a list of screen element rectangles with the specified word.
To test the new functionality, download the sample project file here. For example, this table has a different number of columns per row.
The Get Table function returns a list of rows: the first row has four cells, the second row has eight cells, and so on.
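Because row lengths can differ, any logic that consumes the result should not assume a fixed column count. The following Python sketch is not APA code. It models the returned rows as a plain list of lists (an assumption about the shape of the data, with invented cell values) to show one way a jagged table can be read safely. The names `rows`, `cell_counts`, and `get_cell` are hypothetical.

```python
# Hypothetical stand-in for a Get Table result: a list of rows,
# where each row is a list of cell values. All values are invented.
rows = [
    ["A1", "B1", "C1", "D1"],                          # four cells
    ["A2", "B2", "C2", "D2", "E2", "F2", "G2", "H2"],  # eight cells
]


def cell_counts(table):
    """Return the number of cells in each row."""
    return [len(row) for row in table]


def get_cell(table, row_index, col_index, default=None):
    """Return a cell value, or `default` when the row or column does not exist."""
    if 0 <= row_index < len(table) and 0 <= col_index < len(table[row_index]):
        return table[row_index][col_index]
    return default


print(cell_counts(rows))
print(get_cell(rows, 0, 6, default=""))
```

Checking the index against each row's own length, rather than against the length of the first row, is what keeps the lookup safe when the table is asymmetrical.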
ABBYY OCR Object Functionality
You can review the available functions of the ABBYY OCR object from the Connectivity.OCR library in the Real-Time Designer.
To view the Connectivity.OCR Library functionality:
- In Real-Time Designer, open the Project tab.
- Under the References section, expand the Library References node, and select Connectivity.OCR.
- Open the Functionality tab, and from the Type drop-down list, select ABBYY OCR.
- Open the Business Entities tab and expand the OCR type.
The following OCR properties are available:
Property | Description |
---|---|
Current Page | The current page of the document. |
Number of Pages | The number of pages in the document. |
The following OCR functions are available:
Function | Description |
---|---|
Close | Disconnect the ABBYY OCR object from ABBYY. This is important for licensing purposes. |
Determine Brightness | Determines the screen element rectangle brightness. |
Get Block Text | Retrieve the text from a specified block in the current page. The first block in the image is numbered 1. All the text in the block is returned as a single text value. |
Get Block Text with Rectangles | Retrieve a list of screen element rectangles and text from a specified block in the current page. The first block in the image is numbered 1. The list is returned via an instance of the ABBYY OCR Word business entity and includes the screen element rectangles and text. |
Get Current Page Image | Returns an image object of the current page. |
Get Table | Gets a list of the rows and cells (with their values) for the specified table on the current page. |
Get Table Cells | Gets a list of ABBYY OCR Word business entities. |
Get Text | Retrieve the text from the current page. All the text is returned as a single text value. |
Get Text with Rectangles | Retrieve a list of the screen element rectangles and text from the current page. The list is returned via an instance of the ABBYY OCR Word business entity and includes the screen element rectangles and text. |
Get Word Rectangles | Retrieves a list of screen element rectangles with the specified word. |
Load from File | Load the required file. |
Load from Image | Loads an AbbyyOCR object with the image specified. |
Set Handwriting Mode | Sets the handwriting mode: Simple Text, Underlined Text, Text In Frame, Gray Boxes, Char Box Areas, Simple Comb, Comb in Frame, Partitioned Frame. |
Set Languages | Set the languages, for example, Russian, English, Hebrew, German. |
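
Several of the functions above return words paired with screen element rectangles. As a rough illustration of that idea, the Python sketch below models a recognized word as text plus a rectangle and filters a page's words for a specified word, similar in spirit to what Get Word Rectangles is described as doing. This is not the APA or ABBYY API: `Rect`, `OcrWord`, `find_word_rectangles`, the field names, the sample coordinates, and the case-insensitive comparison are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    """Hypothetical rectangle: position and size in pixels."""
    left: int
    top: int
    width: int
    height: int


@dataclass(frozen=True)
class OcrWord:
    """Hypothetical recognized word: its text and where it was found."""
    text: str
    rect: Rect


def find_word_rectangles(words, target):
    """Return the rectangle of every word that equals `target`.

    The comparison ignores case; that is an assumption of this sketch.
    """
    wanted = target.casefold()
    return [w.rect for w in words if w.text.casefold() == wanted]


# Invented sample data standing in for the words on one page.
page_words = [
    OcrWord("Invoice", Rect(10, 10, 80, 20)),
    OcrWord("Total", Rect(10, 200, 50, 20)),
    OcrWord("total", Rect(300, 200, 50, 20)),
]

matches = find_word_rectangles(page_words, "Total")
print(len(matches))
```

Returning every matching rectangle, rather than only the first, reflects the table's description of a list of rectangles for the specified word.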